Abstract

This paper explores the relationship between C++ templates and partial evaluation.
Templates were designed to support generic programming, but unintentionally provided 
the ability to perform compile-time computations and code generation. These features 
are completely accidental, and as a result their syntax is awkward. By recasting these 
features in terms of partial evaluation, a much simpler syntax can be achieved. C++
may be regarded as a two-level language in which types are first-class values. Template 
instantiation resembles an offline partial evaluator. This paper describes preliminary 
work toward a single mechanism based on Partial Evaluation which unifies generic 
programming, compile-time computation and code generation. The language Catat is 
introduced to illustrate these ideas. 

1 Introduction 

Templates were added to the C++ language to support generic programming. However,
their addition unintentionally introduced powerful mechanisms for compile-time computation
and code generation. These mechanisms have proven themselves very useful in generating
optimized code for scientific computing applications [13, 16, 22, 23]. Since they are
accidental features, their syntax is somewhat awkward. The goal of this paper is to achieve
a simpler syntax by recasting these features as partial evaluation. We start by briefly
summarizing the capabilities provided by C++ templates, both intended and accidental. 

1.1 Generic programming 

The original intent of templates was to support generic programming, which can be 
summarized as "reuse through parameterization". Generic functions and objects have 
parameters which customize their behavior. These parameters must be known at compile 
time (i.e. have static binding). For example, a generic vector class can be declared as: 

 1  template<typename T, int N>
    class Vector {

        // some member functions here...
 5  private:
        T data[N];
    };

    // Example use of Vector
10  Vector<int,4> x;

The Vector class takes two template parameters (line 1): T, a type parameter, specifies the
element type for the vector; N, a nontype parameter, is the length of the vector. To use the 
vector class, template arguments must be provided (line 10). This causes the template to 
be instantiated: an instance of the template is created by replacing all occurrences of T and 
N in the definition of Vector with int and 4, respectively. 

Functions may also be templates. Here is a function template which sums the elements 
of an array: 

template<typename T>
T sum(T* array, int numElements)
{
    T result = 0;
    for (int i=0; i < numElements; ++i)
        result += array[i];
    return result;
}

This function works for built-in types, such as int and float, and also for user-defined types 
provided they have appropriate operators (=, +=) defined. Templates allow programmers 
to develop classes and functions which are very customizable, yet retain the efficiency of 
statically configured code. 

1.2 Compile-time computations 

Templates can be exploited to perform computations at compile time. This was discovered 
by Erwin Unruh who wrote a program which produced these errors at compile time: 

erwin.cpp 10: Cannot convert 'enum' to 'D<2>'
erwin.cpp 10: Cannot convert 'enum' to 'D<3>'
erwin.cpp 10: Cannot convert 'enum' to 'D<5>'
erwin.cpp 10: Cannot convert 'enum' to 'D<7>'
erwin.cpp 10: Cannot convert 'enum' to 'D<11>'

The program tricked the compiler into calculating a list of prime numbers! This capability 
was quite accidental, but has turned out to be very useful. Here is a simpler example which 
calculates pow(X,Y) at compile time: 

template<int X, int Y>
struct ctime_pow {
    static const int result = X * ctime_pow<X,Y-1>::result;
};

// Base case to terminate recursion
template<int X>
struct ctime_pow<X,1> {
    static const int result = X;
};

// Example use:
const int z = ctime_pow<5,3>::result;   // z = 125

The first template defines a structure ctime_pow which has a single data member
result. The static const qualifiers of result make its value available at compile time.
ctime_pow<X,Y> refers to ctime_pow<X,Y-1>, so the compiler must recursively instantiate
the template for Y, Y-1, Y-2, ... until it hits the base case provided by the second template,
which is a partial specialization.
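
The instantiation chain described above can be checked directly. The following is a minimal sketch, assuming a C++11 compiler for static_assert (a feature that postdates the templates discussed here); each assertion forces one more level of recursive instantiation.

```cpp
// Same definition as in the text, repeated so the sketch is self-contained.
template<int X, int Y>
struct ctime_pow {
    static const int result = X * ctime_pow<X, Y - 1>::result;
};

// Partial specialization: terminates the recursion at Y == 1.
template<int X>
struct ctime_pow<X, 1> {
    static const int result = X;
};

// Each check is evaluated by the compiler, not at run time.
static_assert(ctime_pow<5, 1>::result == 5,   "base case only");
static_assert(ctime_pow<5, 2>::result == 25,  "one recursive instantiation");
static_assert(ctime_pow<5, 3>::result == 125, "two recursive instantiations");
```

If any of these assertions were false, compilation itself would fail, which confirms that the values are produced during template instantiation.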

Here is an array class which uses ctime_pow to calculate the number of array elements 
needed: 

template<typename T_numtype, int N_length, int N_dim>
class SquareArray {
    // ...
    static const int numElements = ctime_pow<N_length,N_dim>::result;
    T_numtype data[numElements];
};

// Example use:
SquareArray<float,4,2> x;   // 4x4 array: will have 16 elements
SquareArray<float,4,3> y;   // 4x4x4 array: will have 64 elements

When the SquareArray template is instantiated, ctime_pow is used to calculate the array 
size required. Similar techniques can be used to find greatest common divisors, test for 
primality, and so on. It is even possible to implement an interpreter for a subset of Lisp 
which runs at compile time [?].
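
As an illustration of the greatest common divisor case, here is a sketch based on Euclid's algorithm. The name ctime_gcd is ours, chosen by analogy with ctime_pow, and static_assert assumes C++11.

```cpp
// Recursive case: gcd(A, B) = gcd(B, A mod B).
template<int A, int B>
struct ctime_gcd {
    static const int result = ctime_gcd<B, A % B>::result;
};

// Base case: gcd(A, 0) = A. This partial specialization is selected
// before A % 0 could ever be evaluated in the primary template.
template<int A>
struct ctime_gcd<A, 0> {
    static const int result = A;
};

static_assert(ctime_gcd<48, 18>::result == 6, "gcd(48, 18) is 6");
```

As with ctime_pow, the loop of the usual run-time algorithm is replaced by a chain of template instantiations.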

1.3 Code generation 

It turns out that compile-time versions of flow control structures (loops, if/else, case
switches) can all be implemented in terms of templates. For example, the definition of
ctime_pow (Section 1.2) emulates a for loop using tail recursion. These compile-time
programs can perform code generation by selectively inlining code as they are "interpreted"
by the compiler. This technique is called template metaprogramming. Here is a template
metaprogram which generates a specialized dot product algorithm:

template<typename T, int I, int N>
struct meta_dot {
    static inline T f(T* a, T* b)
    { return meta_dot<T,I-1,N>::f(a,b) + a[I]*b[I]; }
};

template<class T, int N>
struct meta_dot<T,0,N> {
    static inline T f(T* a, T* b)
    { return a[0]*b[0]; }
};

// Example use:
float x[3], y[3];
float z = meta_dot<float,2,3>::f(x,y);   // **

In the above example, the call to meta_dot in the line marked ** results in code equivalent to:

float z = x[0]*y[0] + x[1]*y[1] + x[2]*y[2];

Head recursion is used to unroll the loop over the vector elements. The syntax for writing 
such code generators is clumsy. However, the technique has proven very useful in producing 
specialized algorithms for scientific computing [?, 23].
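
The compile-time if/else mentioned at the start of this section can be sketched as follows. All names here (ctime_if, use_unrolled, use_loop, dot_strategy) are hypothetical and chosen for illustration; the sketch assumes C++11 for constexpr and static_assert.

```cpp
// Compile-time if/else: the primary template is the "true" branch,
// the partial specialization is the "false" branch.
template<bool Condition, typename Then, typename Else>
struct ctime_if {
    typedef Then type;
};

template<typename Then, typename Else>
struct ctime_if<false, Then, Else> {
    typedef Else type;
};

// Two stand-ins for alternative code fragments.
struct use_unrolled { static constexpr int id() { return 1; } };
struct use_loop     { static constexpr int id() { return 2; } };

// Select a strategy from the static vector length N: unroll short
// vectors, fall back to an ordinary loop for long ones.
template<int N>
struct dot_strategy {
    typedef typename ctime_if<(N <= 4), use_unrolled, use_loop>::type type;
};

static_assert(dot_strategy<3>::type::id() == 1,   "short vectors are unrolled");
static_assert(dot_strategy<100>::type::id() == 2, "long vectors use a loop");
```

Only the selected branch is ever used by client code, so the decision costs nothing at run time.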



It is even possible to create and manipulate static data structures at compile time, by
encoding them as templates. This is the basis of the expression templates technique [20],
which creates parse trees of array expressions at compile time. These parse trees are used to
generate efficient evaluation routines for array expressions. This technique is the backbone
of several libraries for object-oriented numerics [22].
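
To make the idea of a compile-time parse tree concrete, here is a deliberately small sketch. It is not the implementation used by the cited libraries: the names (et, Vec3, Sum, evaluate) are ours, only addition is supported, and operands are stored by value for simplicity where a production library would typically hold references. It assumes C++11.

```cpp
#include <cstddef>

namespace et {

// A tiny fixed-size vector.
struct Vec3 {
    double data[3];
    constexpr double operator[](std::size_t i) const { return data[i]; }
};

// Parse-tree node for "left + right". No arithmetic happens when the
// node is built; elements are computed on demand by operator[].
template<typename L, typename R>
struct Sum {
    L left;
    R right;
    constexpr double operator[](std::size_t i) const { return left[i] + right[i]; }
};

// Building the tree: a + b + c has type Sum<Sum<Vec3, Vec3>, Vec3>.
template<typename L, typename R>
constexpr Sum<L, R> operator+(const L& l, const R& r) {
    return Sum<L, R>{l, r};
}

// Evaluate a whole expression tree in a single pass over the elements.
template<typename Expr>
constexpr Vec3 evaluate(const Expr& e) {
    return Vec3{{e[0], e[1], e[2]}};
}

constexpr Vec3 a{{1.0, 2.0, 3.0}};
constexpr Vec3 b{{10.0, 20.0, 30.0}};
constexpr Vec3 c{{100.0, 200.0, 300.0}};
constexpr Vec3 r = evaluate(a + b + c);

static_assert(r[0] == 111.0 && r[1] == 222.0 && r[2] == 333.0,
              "the tree is evaluated element by element");

}  // namespace et
```

The type of the expression encodes its structure, so no temporary vector is needed for the intermediate result a + b.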



1.4 Traits 



The traits technique [14] allows programmers to define "functions" which operate on and
return types rather than data. As a motivating example, consider a generic function which 
calculates the average value of an array. What should its return type be? If the array 
contains integers, a floating-point result should be returned. But a floating-point return 
type obviously will not suffice for a complex-valued array. 

The solution is to define a traits class which maps from the type of the array elements 
to a type suitable for containing their average. Here is a simple implementation: 



template<typename T>
struct average_traits {
    typedef T T_average;       // default behaviour: T -> T
};

template<>
struct average_traits<int> {
    typedef float T_average;   // specialization: int -> float
};

An appropriate type for averaging an array of type T is given by
average_traits<T>::T_average. Here is an implementation of average:

template<class T>
typename average_traits<T>::T_average average(T* array, int N)
{
    typename average_traits<T>::T_average result = sum(array,N);
    return result / N;
}
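
The pieces above can be assembled into a self-contained sketch. Two departures from the listings are assumptions made only so that the result can be checked by the compiler: the functions are marked constexpr (which requires C++14 because of the loop), and the array parameters are const T* rather than T*.

```cpp
#include <type_traits>

template<typename T>
struct average_traits {
    typedef T T_average;       // default: T -> T
};

template<>
struct average_traits<int> {
    typedef float T_average;   // specialization: int -> float
};

template<typename T>
constexpr T sum(const T* array, int numElements) {
    T result = 0;
    for (int i = 0; i < numElements; ++i)
        result += array[i];
    return result;
}

template<typename T>
constexpr typename average_traits<T>::T_average average(const T* array, int N) {
    // The traits class supplies the result type.
    typename average_traits<T>::T_average result = sum(array, N);
    return result / N;
}

constexpr int    ints[4]    = {1, 2, 3, 4};
constexpr double doubles[2] = {1.0, 2.0};

// An int array is averaged in float, so the fractional part survives.
static_assert(std::is_same<decltype(average(ints, 4)), float>::value,
              "int -> float");
static_assert(average(ints, 4) == 2.5f, "(1+2+3+4)/4 in float");
static_assert(std::is_same<decltype(average(doubles, 2)), double>::value,
              "double -> double by default");
```

Without the int specialization, average(ints, 4) would be computed in integer arithmetic and yield 2.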



2 Templates as partial evaluation 

Partial evaluators [12] regard a program's data as containing two subsets: static data, which
is known at compile time, and dynamic data, which is not known until run time. A partial 
evaluator evaluates as much of a program as possible (using the static data) and outputs a 
specialized residual program. 

To determine which portions of a program may be evaluated, a partial evaluator 
performs binding time analysis to label language constructs and data as static or dynamic. 
Such a labelled language is called a two-level language. For example, a binding-time analysis 
of some scientific computing code might produce this two-level code fragment: 

float volumeOfCube(float length)
{
    return pow(length, 3);
}

float pow(float x, int N)
{
    float y = 1;
    for (int i=0; i < N; ++i)
        y *= x;
    return y;
}

in which the static constructs are the constant 3, the parameter N, and the for loop
over i. A partial evaluator such as CMix [1] would
evaluate the static constructs to produce the residual code:

float volumeOfCube(float length)
{
    return pow3(length);
}

float pow3(float x)
{
    float y = 1;
    y *= x;
    y *= x;
    y *= x;
    return y;
}

Such specializations can result in substantial performance improvements for scientific code [?].



2.1 C++ as a two-level language

C++ templates resemble a two-level language. Function templates take both template 
parameters (which have static binding) and function arguments (which have dynamic 
binding). For example, the pow function of the previous example might be declared in 
C++ as: 

template<int N>
float pow(float x);   // Calculate pow(x,N)

The static data (N) is a template parameter, and the dynamic data (x) is a function 
argument. To incorporate template type parameters into this viewpoint, we need to regard 
types as first-class values. For example, in a declaration such as 

template<typename X, int Y> 
void func(int i, int j); 

we regard X as a piece of data whose value is a type. Since C++ is statically typed, type 
variables may only have static binding. 
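
One possible definition of the two-level pow declared above is sketched here. The function is named power to avoid clashing with the standard library's pow, and it is marked constexpr (C++11) purely so that the result can be checked with static_assert; both choices are ours.

```cpp
// N is static (a template parameter); x is dynamic (a function argument).
// Instantiating power<3> unrolls the multiplication chain x * x * x * 1.
template<int N>
constexpr float power(float x) {
    return x * power<N - 1>(x);
}

// Explicit specialization for N == 0 terminates the recursion.
template<>
constexpr float power<0>(float) {
    return 1.0f;
}

static_assert(power<3>(2.0f) == 8.0f, "2 cubed");
```

The instantiation power<3> plays the role of the residual function pow3 from the partial evaluation example.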

It is useful to think of the type of X as being typename, which can be regarded as a type 
whose possible values span all types. This point of view has a certain simplifying power: 
for example, we can now view typedefs as assignments between typename variables. For 
example, 

typedef float float_type; 

can be regarded as equivalent to the (fictional) syntax

typename float_type = float;

2.2 Template instantiation as offline PE

Partial evaluation of languages which contain binding-time information is called offline 
partial evaluation. Template instantiation resembles offline partial evaluation: the compiler 
takes template code (a two-level language) and evaluates those portions of the template 
which involve template parameters (statically bound values). For example, consider this 
template class: 

template<int X>
struct foo {
    static const int result = foo<(X % 2 == 0) ? (X/2) : (3*X+1)>::result;
};

// Base case: X = 1
template<>
struct foo<1> {
    static const int result = 0;
};

When foo<X> is instantiated, the compiler must determine if X % 2 == 0 (i.e. whether X
is even). If true, it instantiates foo<X/2>; otherwise foo<3*X+1> is instantiated. In theory,
this continues until the compiler hits the base case X=1.¹
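
The foo template always yields 0, so the instantiation chain itself is invisible in the result. A small variant, which we call collatz_steps (a hypothetical name), counts the instantiations needed to reach the base case and so makes the compiler's work observable. It assumes C++11 for static_assert.

```cpp
// One instantiation per step of the sequence X -> X/2 or X -> 3X+1.
template<int X>
struct collatz_steps {
    static const int value =
        1 + collatz_steps<(X % 2 == 0) ? (X / 2) : (3 * X + 1)>::value;
};

// Base case: X = 1 needs no further steps.
template<>
struct collatz_steps<1> {
    static const int value = 0;
};

// 6 -> 3 -> 10 -> 5 -> 16 -> 8 -> 4 -> 2 -> 1 is eight steps.
static_assert(collatz_steps<6>::value == 8, "eight instantiations below the top level");
```

Which branch of the conditional is taken is decided entirely from the static value X, which is the behaviour one expects of an offline partial evaluator.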

3 A simpler syntax: Catat 

We now present preliminary ideas for a single mechanism based on Partial Evaluation which 
unifies generic programming, compile-time computation, and code generation. To illustrate 
the ideas, we introduce a (currently hypothetical) language Catat. Catat is a multi-level 
language based on C++ in which types are first-class values. 

3.1 Binding time specifications 

Each scope in a Catat program is associated with a default binding time. By default, the
global scope has dynamic binding. To indicate statically bound variables, an @ symbol is 
appended to the type: 

int i = 0; // Dynamic data 
int@ j = 0; // Static data 

The type int@ is equivalent to const int in C++. 

To preserve consistency between the dynamic and static versions of the language, it is 
necessary to allow multiple levels of binding (or stages). The @ symbol indicates that a 
variable is bound in the previous stage. 

The @ symbol may also be applied to control constructs: 

// Calculate N! (factorial) at compile time 
int@ N = 5, Nfact = 1;
for@ (int@ i=1; i <= N; ++i)
    Nfact *= i;



¹ Whether this recursion terminates for all X is a well-known open problem. In C++, it is in general
impossible to determine if a chain of template instantiations will ever terminate. For this reason,
compilers place limits on the depth of template instantiation chains.

Operators such as = and * are applied at compile-time if their operands are statically
bound. Data may flow from static to dynamic constructs, but not vice versa. This is called
cross-stage persistence by Taha and Sheard. For example:

int@ i ; 
int j; 

j = i; // Okay, i is known at runtime 
i = j; // Not okay, j not known at ctime 
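
A rough C++ analogue of these binding-time rules is available through constexpr, which was added to the language in C++11 (loops inside constexpr functions require C++14). The sketch below is only an analogy and not Catat; the namespace and variable names are ours.

```cpp
namespace binding_demo {

// Analogue of the for@ loop: evaluated by the compiler when the
// argument is static.
constexpr int factorial(int n) {
    int result = 1;
    for (int i = 1; i <= n; ++i)
        result *= i;
    return result;
}

constexpr int N = 5;                  // static data, like int@ N
constexpr int Nfact = factorial(N);   // computed at compile time
static_assert(Nfact == 120, "5! is known to the compiler");

int j = Nfact;                        // static -> dynamic: allowed

// constexpr int bad = j;             // dynamic -> static: rejected,
//                                    // j is not a constant expression

}  // namespace binding_demo
```

The commented-out line corresponds to the rejected assignment i = j in the Catat fragment.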

3.2 Functions 

Functions in Catat may take a mixture of static and dynamic arguments. We find 
it convenient to give functions two separate parameter lists, as in C++. Here is an
implementation of the meta_dot function described earlier: 

function dot(int@ N, typename@ T) (T* a, T* b) {
    T result = 0;
    for@ (int@ i=0; i < N; ++i)
        result += a[i]*b[i];
    return result;
}

Note how much simpler this definition is than its template metaprogram counterpart 
(Section 1.3).

The function dot may be thought of as a generating extension or higher-order function. 
The concept is easier to express in a functional notation: 

(define dot
  (lambda (static-parms)
    (PE static-parms
        (lambda (dynamic-parms)
          body))))

where (PE parms expr) performs partial evaluation of expr using static parameters parms. 
The use of argument lists of the form (static-parms) (dynamic-parms) hints at this idea, 
and also avoids the parsing difficulty associated with <> brackets in C++. 
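
The two-stage calling shape, static parameters first and dynamic parameters second, can be imitated in C++ with a function that returns a function object. This sketch shows only the shape of a generating extension: no partial evaluation takes place, and the loop inside the returned object is not unrolled. PowResidual and make_pow are hypothetical names, and the code assumes C++14.

```cpp
// Stands in for the residual function: it remembers the static
// parameter n and still expects the dynamic parameter x.
struct PowResidual {
    int n;
    constexpr float operator()(float x) const {
        float y = 1;
        for (int i = 0; i < n; ++i)
            y *= x;
        return y;
    }
};

// Stands in for the generating extension: (static-parms) -> residual.
constexpr PowResidual make_pow(int n) {
    return PowResidual{n};
}

constexpr PowResidual cube = make_pow(3);   // first stage
static_assert(cube(2.0f) == 8.0f, "second stage supplies x");
```

A true generating extension would additionally specialize the body with respect to n, as PE does in the functional notation above.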

Catat discards the return type specification of C++ and replaces it with the keyword 
function. The return type may result from compile-time calculations, and so must be 
inferred from the body of the function. As in C++, we allow static parameters to be 
inferred from dynamic argument types; for example, in the function average, T can be 
inferred from the type of the array argument. 

Functions may be evaluated at either compile-time or run-time. They are not fixed to 
any stage. For example, given this definition of pow: 

function pow(int X, int N) {
    int result = 1;
    for (int i = 0; i < N; ++i)
        result *= X;
    return result;
}

One can invoke pow at both run-time and compile-time:

int result1 = pow(2,3);      // Evaluated at run-time
int@ result2 = pow@(2,3);    // Evaluated at compile-time
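
C++ later gained a comparable facility in constexpr functions, which are likewise not tied to one stage. The following sketch assumes C++14; the names ipow and run_time_pow are ours, chosen to avoid the standard library's pow.

```cpp
// One definition, usable at either stage.
constexpr int ipow(int x, int n) {
    int result = 1;
    for (int i = 0; i < n; ++i)
        result *= x;
    return result;
}

// Compile-time use: the initializer must be a constant expression,
// so the compiler evaluates the call itself.
constexpr int result2 = ipow(2, 3);
static_assert(result2 == 8, "evaluated during compilation");

// Run-time use: the arguments are ordinary dynamic values.
inline int run_time_pow(int x, int n) {
    return ipow(x, n);
}
```

Unlike pow@ in Catat, the stage is not chosen at the call site by an annotation; it follows from whether the context requires a constant expression.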

Functions can replace the use of traits classes in C++. Here is a Catat version of 
average_traits (Section 1.4):

// Return a type appropriate for averaging an array of T 
function average_type(typename T) {
    switch(T) {
        case int:      return float;
        case char:     return float;
        case long int: return double;
        // etc.
        default:       return T;
    }
}

This illustrates the usefulness of regarding types as first-class values. Here is the average 
function, recoded in Catat: 

function average(typename@ T) (T* array, int N) {
    typename@ T_average = average_type@(T);

    // Sum the array, divide by N
    T_average sum = 0;
    for (int i=0; i < N; ++i)
        sum += array[i];
    return sum / N;
}

3.3 Specialization 

When calls to function templates are encountered during C++ compilation, the template 
is instantiated. In Catat a similar process would occur, which may be called specialization: 
a partial evaluator produces a residual function by evaluating the static constructs. This 
function call: 

int data[10];   // ...
float result = average(int)(data, 10);

triggers the partial evaluation of average; the resulting specialization (translated into
C++) might be

float average_int(int* array, int N) {
    float sum = 0;
    for (int i=0; i < N; ++i)
        sum += array[i];
    return sum / N;
}

3.4 Classes 

Classes in Catat may take static parameters, and contain both static and dynamic data 
members. For example: 



class SquareArray(typename@ T_numtype, int@ N_length, int@ N_dim) {

public:
    SquareArray@()
    {
        // Calculate array size needed
        if@ ((N_dim < 1) || (N_length < 1))
            Catat_error@("N_dim and N_length must be positive.");
        else@
            numElements = pow@(N_length,N_dim);
    }

    SquareArray()
    {
        // Initially set elements to zero
        for (int i=0; i < numElements; ++i)
            data[i] = 0;
    }

private:
    static int@ numElements;
    T_numtype data[numElements];
}

In this class, there are two constructors: a compile-time constructor SquareArray@() and
a runtime constructor SquareArray(). The SquareArray@() constructor is invoked during
compilation when the class is specialized. At this time, it can check the static parameters
to ensure they are correct; if not, a compile-time error is issued. With the aid of some
reflection, this may be the right way to enforce constraints on template parameters (an
idea due to Vandevoorde [?]). The constructor SquareArray() is invoked at runtime when
instances of the class are created.
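
For comparison, part of the effect of the compile-time constructor can be approximated in C++11 with static_assert and a constexpr helper. This is an approximation under our own naming (static_pow is hypothetical), not a rendering of Catat semantics.

```cpp
// Compile-time integer power; the guard keeps the recursion finite
// even for non-positive exponents.
constexpr int static_pow(int x, int n) {
    return n <= 0 ? 1 : x * static_pow(x, n - 1);
}

template<typename T_numtype, int N_length, int N_dim>
class SquareArray {
    // Plays the role of the check in SquareArray@(): evaluated when the
    // class is instantiated, with a diagnostic on failure.
    static_assert(N_dim >= 1 && N_length >= 1,
                  "N_dim and N_length must be positive.");

public:
    // Plays the role of numElements = pow@(N_length, N_dim).
    static constexpr int numElements = static_pow(N_length, N_dim);

    // Run-time constructor: value-initialization sets all elements to zero.
    SquareArray() : data() {}

private:
    T_numtype data[numElements];
};

static_assert(SquareArray<float, 4, 2>::numElements == 16, "4x4 array");
```

Instantiating SquareArray<float, 4, 0> would stop compilation with the message given in the static_assert.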

As with functions, classes have no specified binding time, but may be instantiated at 
any stage. For example, 

SquareArray@(int, 3, 2) x;   // Array instantiated at compile-time
SquareArray(int, 3, 2) y;    // Array instantiated at run-time

3.5 Thoughts on compiling Catat 

To implement Catat as described, one apparently needs both a Catat interpreter and a
Catat compiler. For example, to compile a function f which has static parameters s and
dynamic parameters d, these steps would be needed:

1. Use the interpreter to partially evaluate f using the static parameters s:

   $[\![\mathit{interp}]\!]\,[f, s] = f_s$

2. Use the compiler to produce native code for the residual function $f_s$:

   $[\![\mathit{compiler}]\!]\, f_s = f_s^*$

where * indicates native code.² This seems wasteful; there would be lots of duplicated effort
to create both an interpreter and compiler.

It may be possible to avoid this problem by using an approach similar to that pioneered 
by the Cmix partial evaluation system [?]. The basic approach is to use a "closure compiler"
which uses run-time code generation (RTCG) to compile a single function. RTCG is a bit 
of a misnomer, since the code generation is being done at compile-time by the compiler. 

For example, to evaluate code such as 

int@ result2 = pow@(2,3); // Evaluated at compile-time 

one could use the closure compiler (CC) to compile pow to native code, and then execute
the function with the arguments (2, 3):

$[\![\mathit{CC}]\!]\,\mathit{pow} = \mathit{pow}^*$
$[\![\mathit{pow}^*]\!]\,[2, 3] = 8$

To evaluate a two-level function such as f(s)(d), it must first be transformed into a
single-level function. We suggest the term flattening for this transform. Flattening turns two-level
code into single-level code, by replacing dynamic code with static code that generates syntax
trees for the dynamic code. For example, the two-level function

float pow(int@ N) (float x)
{
    float result = 1;
    for@ (int@ i = 0; i < N; ++i)
        result *= x;
    return result;
}

would be transformed into something like: 

ASTree powgen(int N)
{
    ASTree func = make_lambda(...);
    ASTree x = make_varref("x");

    func.body().append(make_vardecl(float, "result", 1));
    ASTree result = make_varref("result");

    for (int i=0; i < N; ++i)
        func.body().append(make_op("*=", result, x));

    func.body().append(make_return(result));
    return func;
}
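
The same idea can be tried out in ordinary C++ if the residual code is represented as source text instead of an ASTree. The sketch below makes that substitution; powgen_source is a hypothetical stand-in for powgen, and emitting strings is our simplification, not the mechanism proposed for Catat.

```cpp
#include <cassert>
#include <string>

// Flattened generator: static code runs here and now, dynamic code is
// emitted as text for a later compilation step.
std::string powgen_source(int n)
{
    assert(n >= 0);   // the static parameter must be a valid loop bound

    std::string code = "float pow" + std::to_string(n) + "(float x)\n{\n";
    code += "    float result = 1;\n";

    // The static loop executes in the generator...
    for (int i = 0; i < n; ++i)
        code += "    result *= x;\n";   // ...the dynamic statement is only emitted

    code += "    return result;\n}\n";
    return code;
}
```

Calling powgen_source(3) yields the text of a function with three copies of result *= x; and no loop, matching the residual pow3 of Section 2.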

With the aid of this flattening transformation, two-level functions can be compiled without 
an interpreter: 



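² The $[\![\cdot]\!]$ notation is from partial evaluation: if pow is a function, then
$[\![\mathit{pow}]\!]\,[\mathit{input}]$ is the result of executing pow with input. For example,
$[\![\mathit{pow}]\!]\,[3, 2] = 9$.

\[
\begin{aligned}
f(s)(d) &\xrightarrow{\text{flatten}} f_{\mathit{gen}}(s) \\
[\![\mathit{CC}]\!]\, f_{\mathit{gen}} &= f_{\mathit{gen}}^* \\
[\![f_{\mathit{gen}}^*]\!]\, s &= f_s \\
[\![\mathit{CC}]\!]\, f_s &= f_s^*
\end{aligned}
\]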

i.e. first the flattening transform is used to turn f(s)(d) into a single-level generator for f,
called $f_{\mathit{gen}}$. This generator is compiled using CC into native code $f_{\mathit{gen}}^*$. The native-code
version is then executed with the static parameters s, and produces the specialized version
$f_s$. This function is then compiled using CC.

The availability of fast, portable run-time code generation systems such as vcode [?]
makes this approach to compilation possible. 



3.6 Interesting possibilities 

Languages with Catat-like abilities raise some interesting possibilities: 

3.6.1 Scripting. The partial evaluator for Catat needs to contain what is essentially
an interpreter to evaluate the static portions of the program. This implies that you get 
scripting for no extra cost; a Catat program consisting solely of static constructs will be 
completely interpreted, with no residual code generated. With a little bit of extra work, it 
ought to be possible to dynamically link to already-compiled Catat programs; this would 
make it possible to "steer" applications using a natural scripting interface. 

3.6.2 Futamura projection. Suppose we wrote an interpreter for a domain-specific
language (DSL) in Catat. We could design our interpreter to take the input text as a static
parameter, and input variables as dynamic parameters. Residualization of the interpreter
would result in the DSL code being compiled into the dynamic subset of Catat, via the
first Futamura projection [?]. This approach would allow users to incorporate fragments of
domain-specific languages into their applications, without sacrificing efficiency. 
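
A small-scale version of this effect can already be observed with C++ templates. In the sketch below the DSL program is encoded as a list of types, since templates cannot take program text as a parameter, and the interpreter is specialized to that program by instantiation. All names (AddConst, MulConst, Interp, Program) are hypothetical, and the code assumes C++11.

```cpp
// Instructions of a toy DSL: each one transforms an accumulator.
template<int K> struct AddConst {
    static constexpr int apply(int acc) { return acc + K; }
};
template<int K> struct MulConst {
    static constexpr int apply(int acc) { return acc * K; }
};

// The interpreter: the program (Ops...) is static, the input x is dynamic.
template<typename... Ops>
struct Interp;

// Empty program: return the accumulator unchanged.
template<>
struct Interp<> {
    static constexpr int run(int x) { return x; }
};

// Execute the first instruction, then interpret the rest.
template<typename Op, typename... Rest>
struct Interp<Op, Rest...> {
    static constexpr int run(int x) {
        return Interp<Rest...>::run(Op::apply(x));
    }
};

// The DSL program "(x + 1) * 3 + 2", fixed at compile time.
typedef Interp<AddConst<1>, MulConst<3>, AddConst<2>> Program;

static_assert(Program::run(4) == 17, "(4 + 1) * 3 + 2");
```

After inlining, Program::run contains no trace of instruction dispatch, so the specialized interpreter behaves like compiled code for the fixed program.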

3.6.3 Reflection and Meta-level Processing. A language like Catat may provide
a natural environment for implementing reflection and meta-level processing capabilities, 
since the ability to perform compile-time calculations is there already. Such capabilities 
would allow programmers to query objects about their methods and members, determine 
the parameter types of functions, and perhaps even manipulate and generate abstract 
syntax trees. 



4 Related Work 

Nielson and Nielson first investigated two-level languages and showed that binding-time
analysis can be expressed as a form of type checking. The most closely related work
is MetaML, a statically typed multi-level language for hand-writing code generators [18].
MetaML does not appear to address the issue of generic programming. Glück and Jørgensen
described a program generator for multi-level specialization [?] which uses a multi-level
functional language to represent automatically produced program generators.

Metalevel processing systems address many of the same problems as Catat; they
give library writers the ability to directly manipulate abstract syntax trees at compile
time. Relevant examples are Xroma [?], MPC++ [11], Open C++ [?], and Magik [?].
These systems are not phrased in terms of partial evaluation or two-level languages; code
generation is generally done by constructing abstract syntax trees. A more closely related
system is Catacomb [17], which provides a two-level language for generating runtime library
code for parallelizing compilers. However, it does not address issues of generic programming.

The idea of types as first-class values originates in polymorphic or second order typed 
lambda calculus, and languages based on it. 



5 Conclusions 

We have shown that C++ with templates may be regarded as a two-level language in 
which types are first-class values with static binding, and that template instantiation bears 
a striking resemblance to offline partial evaluation. Languages built on this insight may offer 
a way to provide generic programming, code generation, and compile-time computation via 
a single mechanism with simple syntax. 